TASK 1

  • The first task was to perform analyses to help the client understand why Emmersion’s Speaking and Writing assessments could be used in place of the human-rated Speaking and Writing assessments.

1. Data Checking

Check Missing Values

First, I selected the variables relevant to Task 1: examinee IDs, four Emmersion English assessment scores (speaking, writing, reading, and listening), two human-rated scores (speaking and writing), one combined score, and one summer level score, which is used to suggest the next level of English class. The final data consisted of 9 variables and 127 observations, with no missing values.

2. Descriptive Statistics

Scores Before Standardizing

type n.students Mean SD Min Max Range Skewness
HRSPEAK 127 4.24 1.23 1.23 6.60 5.37 -0.25
EMSPEAKING 127 6.16 1.13 3.90 9.10 5.20 0.23
HRWRITE 127 4.21 1.13 1.19 6.70 5.51 -0.19
EMWRITING 127 5.37 1.51 2.20 10.00 7.80 0.23
EMREADING 127 602.42 105.46 303.00 729.00 426.00 -0.78
EMLISTENING 127 469.49 113.49 144.00 657.00 513.00 -0.49
CombinedPlacementTestBattery 127 4.25 1.07 1.18 5.97 4.79 -0.73
SummerLEVEL 127 4.19 1.01 1.00 6.00 5.00 -0.60

Next, I looked at the descriptive statistics of each score. As the table shows, the Emmersion reading and listening scores are on a very different scale from the rest of the scores, so I standardized all eight score variables to put them on the same scale and make them comparable.
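As a minimal sketch (not the project code), standardizing a score means subtracting its mean and dividing by its standard deviation, which `scale()` does in R:

```r
# z-scoring shifts and rescales a set of scores: z = (x - mean(x)) / sd(x)
x <- c(4.2, 5.1, 3.8, 6.0, 4.9)        # toy raw scores
z <- as.numeric(scale(x))              # standardized scores
round(mean(z), 10)                     # 0
round(sd(z), 10)                       # 1
```

Because this is a linear transformation, the shape of the distribution (including skewness) is unchanged.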

One thing to note is that standardizing does not change the shape of the original distribution. The two plots show exactly what happens after standardizing: the distribution stays the same and only the scale of the x-axis changes.

Example

Scores After Standardizing

type n.students Mean SD Min Max Range Skewness
HRSPEAK 127 0 1 -2.45 1.92 4.37 -0.25
EMSPEAKING 127 0 1 -2.00 2.61 4.61 0.23
HRWRITE 127 0 1 -2.68 2.21 4.89 -0.19
EMWRITING 127 0 1 -2.10 3.06 5.16 0.23
EMREADING 127 0 1 -2.84 1.20 4.04 -0.78
EMLISTENING 127 0 1 -2.87 1.65 4.52 -0.49
CombinedPlacementTestBattery 127 0 1 -2.87 1.59 4.46 -0.73
SummerLEVEL 127 0 1 -3.16 1.80 4.96 -0.60

This table shows the descriptive statistics of each standardized score, with mean \(0\) and standard deviation \(1\). The ranges of the scores are now quite similar, and none of the scores is highly skewed, since all skewness values fall between \(-1\) and \(1\).

Comparison between EMSPEAKING and HRSPEAK

Next, I checked the distributions of the human-rated speaking score and the Emmersion speaking score. The Emmersion score curve is more peaked around zero, meaning that more examinees are near the mean. The red area shows that there are more examinees in the higher and lower ranges of the human-rated speaking score. This may indicate that human raters are more likely to give extreme scores.

Comparison between EMWRITING and HRWRITE

Similar distribution shapes were found for writing scores.

Comparison between Combined Score and EM Total Score

Descriptives of Total Scores

type n.students Mean SD Min Max Range Skewness
CombinedPlacementTestBattery 127 0 1.00 -2.87 1.59 4.46 -0.73
EM.total.score 127 0 0.83 -2.29 2.10 4.39 -0.33

To understand the relationship between the Emmersion speaking and writing scores and the total score, I wanted an Emmersion total score. First, I examined the combined score provided in the dataset and found that it is almost perfectly correlated with the average of the human-rated speaking and writing scores and the Emmersion reading and listening scores. So I averaged the Emmersion speaking, writing, reading, and listening scores to create an Emmersion total score. Again, both the combined score and the Emmersion total score are standardized.
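The total-score construction described above can be sketched with `rowMeans()`; the data frame and values here are toy stand-ins, though the column names follow the outputs shown in this report:

```r
set.seed(1)
# toy standardized scores standing in for the real data
dat <- data.frame(EMSPEAKING  = rnorm(6), EMWRITING   = rnorm(6),
                  EMREADING   = rnorm(6), EMLISTENING = rnorm(6))
# average the four Emmersion scores into a total score, one value per examinee
dat$EM.total.score <- rowMeans(dat[, c("EMSPEAKING", "EMWRITING",
                                       "EMREADING", "EMLISTENING")])
```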

Density Plots

Here you can see that the Emmersion total score is more normally distributed and less skewed, meaning that more examinees are distributed around the mean and fewer extreme scores were reported. The combined score distribution is more skewed and has fewer examinees around the mean. This again indicates that human raters seem to give more extreme scores, either very high or very low, compared to Emmersion’s automated adaptive tests.

3. Spearman’s Rank-Order Correlation

Next, I ran Spearman’s rank-order correlation tests to evaluate the rank-order agreement between pairs of scores. In each test, the null hypothesis is that there is no monotonic association between the two scores in the population.

All tests conducted below rejected the null hypothesis at the \(\alpha = 0.05\) level, indicating statistically significant monotonic relationships between the scores tested. Setting \(\alpha\) at \(0.05\) means that, if the null hypothesis were true, there would be less than a \(5\%\) chance of observing a relationship as strong as the one found.
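As an illustrative sketch of the test used below, `cor.test()` with `method = "spearman"` can be run on simulated scores (the actual calls use the study data):

```r
set.seed(42)
x <- rnorm(100)                         # e.g., a human-rated score
y <- x + rnorm(100)                     # a monotonically related second score
res <- cor.test(x, y, method = "spearman")
res$estimate                            # sample rho
res$p.value < 0.05                      # TRUE: reject H0 of no monotonic association
```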

Speaking Scores

HRSPEAK & EMSPEAKING


    Spearman's rank correlation rho

data:  dat1.t1.std$HRSPEAK and dat1.t1.std$EMSPEAKING
S = 97509, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.7143656 

The rank-order correlation between the human-rated and Emmersion speaking scores was about \(0.71\), which is fairly high and shows that the Emmersion speaking score captures much of what the human-rated speaking score measures.

HRSPEAK & Combined score


    Spearman's rank correlation rho

data:  dat1.t1.std$HRSPEAK and dat1.t1.std$CombinedPlacementTestBattery
S = 81121, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
     rho 
0.762371 

The correlation between the human-rated speaking score and the combined score was fairly high as well, at about \(0.76\).

EMSPEAKING & EM total score


    Spearman's rank correlation rho

data:  dat1.t1.std$EMSPEAKING and dat1.t1.std$EM.total.score
S = 62574, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.8167018 

The relationship between the Emmersion speaking score and the Emmersion total score was even stronger, about \(0.82\). This means that the higher an examinee ranked on Emmersion Speaking, the higher the examinee tended to rank on the Emmersion total score, and vice versa.

Writing Scores

HRWRITE & EMWRITING


    Spearman's rank correlation rho

data:  dat1.t1.std$HRWRITE and dat1.t1.std$EMWRITING
S = 131783, p-value = 1.646e-14
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.6139657 

The rank-order correlation between the human-rated writing score and the Emmersion writing score was about \(0.61\), which is moderately high.

HRWRITE & Combined score


    Spearman's rank correlation rho

data:  dat1.t1.std$HRWRITE and dat1.t1.std$CombinedPlacementTestBattery
S = 107384, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.6854369 

The correlation between the human-rated writing score and the combined score was moderately high as well, at about \(0.69\).

EMWRITING & EM total score


    Spearman's rank correlation rho

data:  dat1.t1.std$EMWRITING and dat1.t1.std$EM.total.score
S = 57923, p-value < 2.2e-16
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.8303254 

The relationship between the Emmersion writing score and the Emmersion total score was very strong, about \(0.83\). Overall, the results indicate that an examinee’s rank on each Emmersion speaking and writing score is more consistent with the rank on the Emmersion total score than the human-rated scores are with the combined score.

4. Paired Data Plots

Next, I evaluated how examinees are distributed on each score and how their distribution changes when different scores are used, visualizing this with paired-data plots.

HRSPEAK & EMSPEAKING

In this plot, the two score types are on the x-axis, with the human-rated speaking score on the left and the Emmersion speaking score on the right, and the standardized scores are on the y-axis.

The human-rated scores are unevenly distributed: there are many dense points where examinees are clustered, which makes it difficult to discriminate among examinees in terms of their English speaking ability. In contrast, the Emmersion speaking scores are evenly distributed across the score range, indicating that the Emmersion score is much more useful for discriminating among examinees.

HRWRITE & EMWRITING

Very similar patterns were observed with the writing scores.

Combined Score & EM Total Score

Lastly, I examined the difference between the combined score and the Emmersion total score. The Emmersion total score showed fewer examinees at extreme scores and more examinees around the average compared to the combined score. This indicates that the Emmersion total score follows the normal distribution more closely and functions better in assessing examinees’ English abilities.

In summary, the following evidence could be used to support the use of Emmersion Speaking and Writing assessments in place of human-rated assessments:

  • The Emmersion speaking and writing scores and the total score are more normally distributed, better representing the overall English ability of examinees.

  • The relationships between the Emmersion scores and the human-rated scores were fairly strong, indicating that the Emmersion scores do the same job of discriminating among examinees on overall English ability.

  • A much stronger relationship was found between the Emmersion speaking and writing scores and the Emmersion total score than between the human-rated speaking and writing scores and the combined score. This implies that the rank order of examinees aligns very well with the total score when the Emmersion speaking and writing assessments are used.

  • The Emmersion speaking and writing assessments are better at discriminating among examinees than the human-rated assessments. This finding is especially important given that these scores are used to inform examinees of the class level to which they will be assigned.

TASK 2

  • The second task was to evaluate six additional English speaking scores and show which contributes most meaningfully to a person’s English speaking ability.

1. Data Checking

2. Descriptive Statistics

type n.students Mean SD Min Max Range Skewness
FLUENCY1 127 0 1 -3.36 1.75 5.11 -0.97
FLUENCY2 127 0 1 -2.60 2.70 5.30 -0.09
PRONUN1 127 0 1 -2.43 2.32 4.75 -0.30
PRONUN2 127 0 1 -1.47 3.62 5.09 1.53
VOCAB 127 0 1 -0.94 5.03 5.97 2.92
SENTENCEMASTERY 127 0 1 -2.90 1.43 4.33 -0.81
EMSPEAKING 127 0 1 -2.00 2.61 4.61 0.23
HRSPEAK 127 0 1 -2.45 1.92 4.37 -0.25

First, I checked the descriptive statistics of each score. Again, all scores are standardized with mean \(0\) and standard deviation \(1\). Some variables, such as PRONUN2 and VOCAB, are highly skewed, which the boxplot also shows.

Boxplots

3. Correlation

                FLUENCY1 FLUENCY2  PRONUN1  PRONUN2    VOCAB SENTENCEMASTERY EMSPEAKING
FLUENCY1                                                                               
FLUENCY2         0.89***                                                               
PRONUN1           0.22*   0.28**                                                       
PRONUN2          0.39***  0.54***  0.55***                                             
VOCAB            0.30***  0.42***  0.38***  0.90***                                    
SENTENCEMASTERY  0.47***  0.62***  0.44***  0.73***  0.49***                           
EMSPEAKING       0.50***  0.64***  0.43***  0.90***  0.77***         0.87***           
HRSPEAK          0.52***  0.61***  0.31***  0.63***  0.49***         0.71***    0.72***

The correlation table shows that the PRONUN2, VOCAB, and SENTENCEMASTERY scores are fairly highly correlated with the Emmersion speaking score, while only the SENTENCEMASTERY score was fairly highly correlated with the human-rated speaking score.

4. Linear Regression

To investigate which score variable contributes the most to examinees’ English speaking ability, I ran linear regression models using the six additional speaking scores as independent variables and, in separate models, the human-rated speaking score and the Emmersion speaking score as dependent variables.

Outcome variable: HRSPEAK


Call:
lm(formula = HRSPEAK ~ FLUENCY1 + FLUENCY2 + PRONUN1 + PRONUN2 + 
    VOCAB + SENTENCEMASTERY, data = dat1.t2.std)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.62113 -0.52201 -0.05416  0.46225  1.71278 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -7.878e-16  5.925e-02   0.000 1.000000    
FLUENCY1         1.677e-01  1.352e-01   1.240 0.217233    
FLUENCY2         7.477e-02  1.531e-01   0.488 0.626218    
PRONUN1         -8.284e-02  7.538e-02  -1.099 0.273996    
PRONUN2          2.601e-01  2.363e-01   1.101 0.273281    
VOCAB           -4.519e-03  1.757e-01  -0.026 0.979526    
SENTENCEMASTERY  4.314e-01  1.145e-01   3.768 0.000257 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6678 on 120 degrees of freedom
Multiple R-squared:  0.5753,    Adjusted R-squared:  0.5541 
F-statistic: 27.09 on 6 and 120 DF,  p-value: < 2.2e-16

SENTENCEMASTERY was the only statistically significant variable in predicting the human-rated speaking score. It was positively associated with the human-rated speaking score after controlling for all other variables. About \(57.5\%\) of the variation in the human-rated speaking score was explained by the model.

Outcome variable: EMSPEAKING


Call:
lm(formula = EMSPEAKING ~ FLUENCY1 + FLUENCY2 + PRONUN1 + PRONUN2 + 
    VOCAB + SENTENCEMASTERY, data = dat1.t2.std)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.88738 -0.17731 -0.02349  0.13919  0.84548 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -7.308e-16  2.447e-02   0.000 1.000000    
FLUENCY1         7.050e-02  5.582e-02   1.263 0.208989    
FLUENCY2         5.545e-04  6.322e-02   0.009 0.993017    
PRONUN1         -7.969e-02  3.112e-02  -2.561 0.011690 *  
PRONUN2          3.359e-01  9.756e-02   3.443 0.000792 ***
VOCAB            2.196e-01  7.255e-02   3.027 0.003026 ** 
SENTENCEMASTERY  5.215e-01  4.727e-02  11.032  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2757 on 120 degrees of freedom
Multiple R-squared:  0.9276,    Adjusted R-squared:  0.924 
F-statistic: 256.2 on 6 and 120 DF,  p-value: < 2.2e-16

The PRONUN1, PRONUN2, VOCAB, and SENTENCEMASTERY scores were statistically significant in predicting the Emmersion speaking score. PRONUN1 was negatively associated with the Emmersion speaking score after controlling for all other variables. About \(92.8\%\) of the variation in the Emmersion speaking score was explained by the model.

5. Stepwise Regression Using Backward Selection

I further ran stepwise regression using the backward selection method to find the subset of score variables that yields the best-performing model. Backward selection starts with all predictors in the model (the full model), iteratively removes the least contributing predictor, and stops when all remaining predictors are statistically significant.
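One backward-elimination step can be sketched with `drop1()`, whose "Single term deletions" output is the format shown below; this example uses the built-in `mtcars` data rather than the study data:

```r
# full model with three predictors
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
d1  <- drop1(fit, test = "F")        # F test for each single-term deletion
# remove the least significant predictor (disp here) and refit
fit2 <- update(fit, . ~ . - disp)
```

The process repeats until every remaining predictor is statistically significant.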

Outcome variable: EMSPEAKING

Single term deletions

Model:
EMSPEAKING ~ FLUENCY1 + FLUENCY2 + PRONUN1 + PRONUN2 + VOCAB + 
    SENTENCEMASTERY
                Df Sum of Sq     RSS     AIC  F value    Pr(>F)    
<none>                        9.1222 -320.45                       
FLUENCY1         1    0.1213  9.2435 -320.77   1.5955 0.2089893    
FLUENCY2         1    0.0000  9.1222 -322.45   0.0001 0.9930173    
PRONUN1          1    0.4984  9.6206 -315.70   6.5566 0.0116897 *  
PRONUN2          1    0.9014 10.0236 -310.48  11.8572 0.0007917 ***
VOCAB            1    0.6964  9.8186 -313.11   9.1614 0.0030255 ** 
SENTENCEMASTERY  1    9.2519 18.3741 -233.52 121.7058 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

On the first run, FLUENCY2 was the least significant predictor, so it was deleted in the next model.

Single term deletions

Model:
EMSPEAKING ~ FLUENCY1 + PRONUN1 + PRONUN2 + VOCAB + SENTENCEMASTERY
                Df Sum of Sq     RSS     AIC  F value    Pr(>F)    
<none>                        9.1222 -322.45                       
FLUENCY1         1    0.4871  9.6093 -317.84   6.4615 0.0122868 *  
PRONUN1          1    0.5015  9.6237 -317.66   6.6517 0.0111035 *  
PRONUN2          1    0.9074 10.0296 -312.41  12.0365 0.0007235 ***
VOCAB            1    0.6964  9.8187 -315.11   9.2379 0.0029058 ** 
SENTENCEMASTERY  1    9.8359 18.9581 -231.55 130.4668 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After eliminating the FLUENCY2 variable, all remaining predictors contribute significantly to predicting the Emmersion speaking score.


Call:
lm(formula = EMSPEAKING ~ FLUENCY1 + PRONUN1 + PRONUN2 + VOCAB + 
    SENTENCEMASTERY, data = dat1.t2.std)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.88734 -0.17742 -0.02338  0.13924  0.84536 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -7.308e-16  2.436e-02   0.000 1.000000    
FLUENCY1         7.093e-02  2.790e-02   2.542 0.012287 *  
PRONUN1         -7.971e-02  3.091e-02  -2.579 0.011103 *  
PRONUN2          3.360e-01  9.685e-02   3.469 0.000724 ***
VOCAB            2.196e-01  7.225e-02   3.039 0.002906 ** 
SENTENCEMASTERY  5.216e-01  4.567e-02  11.422  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2746 on 121 degrees of freedom
Multiple R-squared:  0.9276,    Adjusted R-squared:  0.9246 
F-statistic: 310.1 on 5 and 121 DF,  p-value: < 2.2e-16

The \(R^2\) of the final model is \(0.928\), the same as the \(R^2\) of the full model. This confirms that the FLUENCY2 score does not contribute significantly to predicting the Emmersion speaking score. The finding is also consistent in that SENTENCEMASTERY is the strongest contributor to predicting the Emmersion speaking score.

Outcome variable: HRSPEAK

Single term deletions

Model:
HRSPEAK ~ FLUENCY1 + FLUENCY2 + PRONUN1 + PRONUN2 + VOCAB + SENTENCEMASTERY
                Df Sum of Sq    RSS     AIC F value    Pr(>F)    
<none>                       53.510 -95.769                      
FLUENCY1         1    0.6861 54.196 -96.151  1.5387 0.2172332    
FLUENCY2         1    0.1063 53.616 -97.517  0.2385 0.6262182    
PRONUN1          1    0.5385 54.048 -96.497  1.2077 0.2739960    
PRONUN2          1    0.5401 54.050 -96.493  1.2113 0.2732812    
VOCAB            1    0.0003 53.510 -97.768  0.0007 0.9795260    
SENTENCEMASTERY  1    6.3300 59.840 -83.569 14.1956 0.0002568 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I ran the same stepwise regression, now with the human-rated speaking score as the outcome variable. In the full model, only the SENTENCEMASTERY score is statistically significant, and VOCAB is the least contributing predictor, so I removed the VOCAB score from the next model.

Single term deletions

Model:
HRSPEAK ~ FLUENCY1 + FLUENCY2 + PRONUN1 + PRONUN2 + SENTENCEMASTERY
                Df Sum of Sq    RSS     AIC F value    Pr(>F)    
<none>                       53.510 -97.768                      
FLUENCY1         1    0.6858 54.196 -98.151  1.5508  0.215417    
FLUENCY2         1    0.1063 53.616 -99.516  0.2404  0.624820    
PRONUN1          1    0.5897 54.100 -98.376  1.3335  0.250467    
PRONUN2          1    3.1340 56.644 -92.540  7.0867  0.008821 ** 
SENTENCEMASTERY  1    9.1386 62.649 -79.744 20.6647 1.307e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this model the least contributing predictor is the FLUENCY2 score, so I removed it from the next model.

Single term deletions

Model:
HRSPEAK ~ FLUENCY1 + PRONUN1 + PRONUN2 + SENTENCEMASTERY
                Df Sum of Sq    RSS      AIC F value    Pr(>F)    
<none>                       53.616  -99.516                      
FLUENCY1         1    4.9013 58.518  -90.407 11.1526  0.001114 ** 
PRONUN1          1    0.6341 54.251 -100.023  1.4429  0.231991    
PRONUN2          1    3.5104 57.127  -93.462  7.9876  0.005505 ** 
SENTENCEMASTERY  1   10.6053 64.222  -78.594 24.1315 2.827e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now PRONUN1 is the least contributing predictor.

Single term deletions

Model:
HRSPEAK ~ FLUENCY1 + PRONUN2 + SENTENCEMASTERY
                Df Sum of Sq    RSS      AIC F value    Pr(>F)    
<none>                       54.251 -100.023                      
FLUENCY1         1    4.9678 59.218  -90.895 11.2632  0.001052 ** 
PRONUN2          1    2.8900 57.141  -95.432  6.5523  0.011686 *  
SENTENCEMASTERY  1   10.2794 64.530  -79.986 23.3060 4.016e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After removing the PRONUN1 score from the model, all remaining predictors significantly predict the human-rated speaking score.


Call:
lm(formula = HRSPEAK ~ FLUENCY1 + PRONUN2 + SENTENCEMASTERY, 
    data = dat1.t2.std)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.80287 -0.49203 -0.05194  0.43464  1.69636 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -7.468e-16  5.893e-02   0.000  1.00000    
FLUENCY1         2.262e-01  6.739e-02   3.356  0.00105 ** 
PRONUN2          2.228e-01  8.702e-02   2.560  0.01169 *  
SENTENCEMASTERY  4.384e-01  9.081e-02   4.828 4.02e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6641 on 123 degrees of freedom
Multiple R-squared:  0.5694,    Adjusted R-squared:  0.5589 
F-statistic: 54.22 on 3 and 123 DF,  p-value: < 2.2e-16

The final model had an \(R^2\) of \(0.57\), while the initial full model had an \(R^2\) of \(0.58\), meaning that only about \(1\%\) of the explained variance was lost after eliminating three variables from the model. Again, the strongest contributor is SENTENCEMASTERY.

The results showed that the SENTENCEMASTERY score contributed the most not only to the human-rated speaking score but also to the Emmersion speaking score. This suggests that using the SENTENCEMASTERY score would compensate for any detail lost by not using the human-rated speaking assessment.

TASK 3

The third task was to: first, analyze the data with item response theory (IRT) models; second, identify the best-fitting model; and last, recommend \(25\) to \(30\) items that would perform best on a new assessment.

1. Data Checking

The item response data consists of 60 items and 151 examinees.

Renaming Columns

n <- 60
# rename the 60 item columns (the first column holds the examinee IDs)
names(dat2)[2:(n + 1)] <- paste0("Item", 1:n)

First, I renamed the item columns Item1 through Item60.

Check Missing Values

Total number of missing cases

Number and percent of missing by item

Number and percent of missing by examinee

Deleting missing data

[1] 139

About \(1\%\) of the total cases were missing. On average, about \(1\%\) of examinees were missing per item and about \(1\%\) of responses were missing per examinee. Because so little data was missing, I conducted listwise deletion, removing 12 rows and leaving 139 examinees.
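Listwise deletion can be sketched with `na.omit()` on a toy data frame (the report applies the same idea to the full response data):

```r
toy <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))
complete <- na.omit(toy)     # drop every row containing a missing value
nrow(complete)               # 1
```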

2. Descriptive Statistics

vars n mean sd median trimmed mad min max range skew kurtosis se
Item1 1 139 2.54 0.74 3 2.69 0.00 0 3 3 -1.55 1.66 0.06
Item2 2 139 1.57 1.04 2 1.58 1.48 0 3 3 -0.28 -1.12 0.09
Item3 3 139 2.08 0.89 2 2.17 1.48 0 3 3 -0.64 -0.48 0.08
Item4 4 139 2.03 1.04 2 2.15 1.48 0 3 3 -0.60 -0.96 0.09
Item5 5 139 0.93 0.72 1 0.88 0.00 0 3 3 0.45 0.03 0.06
Item6 6 139 2.27 0.84 2 2.38 1.48 0 3 3 -0.91 -0.05 0.07
Item7 7 139 1.81 1.19 2 1.88 1.48 0 3 3 -0.31 -1.49 0.10
Item8 8 139 2.55 0.70 3 2.68 0.00 0 3 3 -1.74 3.06 0.06
Item9 9 139 1.68 1.12 2 1.73 1.48 0 3 3 -0.07 -1.43 0.09
Item10 10 139 2.10 1.03 2 2.24 1.48 0 3 3 -0.71 -0.84 0.09
Item11 11 139 2.10 0.94 2 2.19 1.48 0 3 3 -0.56 -0.93 0.08
Item12 12 139 2.45 0.77 3 2.58 0.00 0 3 3 -1.32 1.14 0.07
Item13 13 139 2.05 0.92 2 2.17 1.48 0 3 3 -0.77 -0.22 0.08
Item14 14 139 1.69 0.96 2 1.73 1.48 0 3 3 -0.18 -0.96 0.08
Item15 15 139 2.32 0.79 2 2.42 1.48 0 3 3 -0.97 0.30 0.07
Item16 16 139 1.72 1.16 2 1.77 1.48 0 3 3 -0.25 -1.43 0.10
Item17 17 139 2.21 1.01 3 2.35 0.00 0 3 3 -0.84 -0.72 0.09
Item18 18 139 1.94 0.94 2 1.97 1.48 0 3 3 -0.18 -1.30 0.08
Item19 19 139 2.60 0.68 3 2.72 0.00 0 3 3 -1.94 4.11 0.06
Item20 20 139 2.60 0.63 3 2.69 0.00 0 3 3 -1.81 4.01 0.05
Item21 21 139 2.36 0.86 3 2.50 0.00 0 3 3 -1.16 0.40 0.07
Item22 22 139 2.54 0.69 3 2.65 0.00 0 3 3 -1.69 3.08 0.06
Item23 23 139 1.32 1.17 1 1.28 1.48 0 3 3 0.34 -1.38 0.10
Item24 24 139 2.34 0.82 3 2.47 0.00 0 3 3 -1.15 0.72 0.07
Item25 25 139 0.94 0.61 1 0.92 0.00 0 3 3 0.22 0.33 0.05
Item26 26 139 1.77 1.00 2 1.83 1.48 0 3 3 -0.30 -1.02 0.09
Item27 27 139 2.47 0.75 3 2.61 0.00 0 3 3 -1.50 2.02 0.06
Item28 28 139 1.57 0.96 1 1.58 1.48 0 3 3 0.17 -1.04 0.08
Item29 29 139 1.42 0.95 1 1.41 1.48 0 3 3 0.29 -0.86 0.08
Item30 30 139 2.56 0.65 3 2.65 0.00 0 3 3 -1.64 3.25 0.06
Item31 31 139 1.52 1.04 1 1.52 1.48 0 3 3 0.09 -1.19 0.09
Item32 32 139 2.41 0.87 3 2.58 0.00 0 3 3 -1.42 1.14 0.07
Item33 33 139 1.40 0.98 1 1.38 1.48 0 3 3 0.20 -0.97 0.08
Item34 34 139 2.42 0.89 3 2.59 0.00 0 3 3 -1.41 0.91 0.08
Item35 35 139 0.76 0.79 1 0.68 1.48 0 3 3 0.63 -0.60 0.07
Item36 36 139 2.46 0.75 3 2.59 0.00 0 3 3 -1.58 2.49 0.06
Item37 37 139 1.36 1.08 1 1.33 1.48 0 3 3 0.22 -1.23 0.09
Item38 38 139 0.94 0.94 1 0.82 1.48 0 3 3 0.74 -0.37 0.08
Item39 39 139 1.54 1.09 1 1.55 1.48 0 3 3 0.15 -1.34 0.09
Item40 40 139 1.14 1.07 1 1.06 1.48 0 3 3 0.55 -0.97 0.09
Item41 41 139 1.35 0.99 1 1.31 1.48 0 3 3 0.38 -0.90 0.08
Item42 42 139 1.35 0.92 1 1.32 0.00 0 3 3 0.46 -0.65 0.08
Item43 43 139 1.51 1.12 1 1.51 1.48 0 3 3 0.08 -1.38 0.09
Item44 44 139 1.35 1.23 1 1.31 1.48 0 3 3 0.25 -1.55 0.10
Item45 45 139 1.05 1.07 1 0.95 1.48 0 3 3 0.69 -0.79 0.09
Item46 46 139 2.78 0.60 3 2.93 0.00 0 3 3 -3.30 11.34 0.05
Item47 47 139 1.12 1.00 1 1.04 1.48 0 3 3 0.54 -0.76 0.08
Item48 48 139 0.97 0.95 1 0.87 1.48 0 3 3 0.61 -0.65 0.08
Item49 49 139 0.88 0.93 1 0.78 1.48 0 3 3 0.67 -0.65 0.08
Item50 50 139 1.12 0.83 1 1.10 1.48 0 3 3 0.17 -0.80 0.07
Item51 51 139 1.33 0.97 1 1.29 1.48 0 3 3 0.34 -0.87 0.08
Item52 52 139 0.92 0.70 1 0.87 0.00 0 3 3 0.61 0.65 0.06
Item53 53 139 1.14 0.80 1 1.12 1.48 0 3 3 0.26 -0.49 0.07
Item54 54 139 0.99 0.88 1 0.91 1.48 0 3 3 0.54 -0.50 0.07
Item55 55 139 0.85 0.77 1 0.77 1.48 0 3 3 0.73 0.31 0.07
Item56 56 139 1.19 1.07 1 1.12 1.48 0 3 3 0.44 -1.06 0.09
Item57 57 139 1.93 0.93 2 2.01 1.48 0 3 3 -0.45 -0.75 0.08
Item58 58 139 1.11 0.81 1 1.04 0.00 0 3 3 0.60 0.07 0.07
Item59 59 139 1.76 0.99 2 1.81 1.48 0 3 3 -0.03 -1.25 0.08
Item60 60 139 1.16 1.01 1 1.08 1.48 0 3 3 0.57 -0.75 0.09
[1] 1.705156
[1] 0.1548018

The minimum and maximum item response values are \(0\) and \(3\), respectively, indicating four response categories. The average response score was about \(1.71\) and the standard deviation was about \(0.15\).

3. Identify best fitting model

When item responses have multiple categories, polytomous item response theory models need to be applied. To identify the best-fitting model, I applied five such models: the partial credit model, the generalized partial credit model, the rating scale model, the graded response model, and the nominal response model.

Partial Credit Model (PCM)

First, I applied the partial credit model. This model was developed to analyze test items that require multiple steps, which allows assigning partial credit for completing some of them. It belongs to the Rasch family of models, meaning that it estimates only item category parameters, not item discrimination parameters.


Call:
mirt(data = dat2.item, model = 1, itemtype = "Rasch")

Full-information item factor analysis with 1 factor(s).
Converged within 1e-04 tolerance after 129 EM iterations.
mirt version: 1.33.2 
M-step optimizer: nlminb 
EM acceleration: Ramsay 
Number of rectangular quadrature: 61
Latent density type: Gaussian 

Log-likelihood = -6077.361
Estimated parameters: 181 
AIC = 12516.72; AICc = 10984.54
BIC = 13047.86; SABIC = 12475.22
G2 (1e+10) = 10785.71, p = 1
RMSEA = 0, CFI = NaN, TLI = NaN
            M2   df p      RMSEA    RMSEA_5   RMSEA_95      SRMSR      TLI
stats 3038.552 1650 0 0.07809073 0.07346065 0.08213571 0.08425738 0.967646
           CFI
stats 0.967646

Generalized Partial Credit Model (GPCM)

The generalized partial credit model extends the partial credit model by estimating an item discrimination parameter in addition to the item category parameters.


Call:
mirt(data = dat2.item, model = 1, itemtype = "gpcm")

Full-information item factor analysis with 1 factor(s).
FAILED TO CONVERGE within 1e-04 tolerance after 500 EM iterations.
mirt version: 1.33.2 
M-step optimizer: BFGS 
EM acceleration: Ramsay 
Number of rectangular quadrature: 61
Latent density type: Gaussian 

Log-likelihood = -5955.801
Estimated parameters: 240 
AIC = 12391.6; AICc = 11257.49
BIC = 13095.88; SABIC = 12336.57
G2 (1e+10) = 10542.59, p = 1
RMSEA = 0, CFI = NaN, TLI = NaN
            M2   df p      RMSEA    RMSEA_5   RMSEA_95      SRMSR       TLI
stats 3259.788 1590 0 0.08723535 0.08266685 0.09116961 0.05963919 0.9596249
            CFI
stats 0.9610931

Rating Scale Model (RSM)

The rating scale model is a constrained version of the partial credit model. It was developed to measure rating scales, such as Likert scales, which are assumed to function in the same way across all items in a test.


Call:
mirt(data = dat2.item, model = 1, itemtype = "rsm")

Full-information item factor analysis with 1 factor(s).
Converged within 1e-04 tolerance after 116 EM iterations.
mirt version: 1.33.2 
M-step optimizer: nlminb 
EM acceleration: Ramsay 
Number of rectangular quadrature: 61
Latent density type: Gaussian 

Log-likelihood = -6503.316
Estimated parameters: 240 
AIC = 13132.63; AICc = 13240.15
BIC = 13317.5; SABIC = 13118.19
G2 (1e+10) = 11637.62, p = 1
RMSEA = 0, CFI = NaN, TLI = NaN
            M2   df p      RMSEA    RMSEA_5   RMSEA_95       TLI       CFI
stats 3884.297 1768 0 0.09313387 0.08884813 0.09674998 0.9539803 0.9506892

Graded Response Model (GRM)

The graded response model is appropriate when item responses are ordered categorical responses. It is a generalization of the 2PL model, so it estimates both item discrimination parameters and category parameters.


Call:
mirt(data = dat2.item, model = 1, itemtype = "graded", SE = TRUE)

Full-information item factor analysis with 1 factor(s).
Converged within 1e-04 tolerance after 328 EM iterations.
mirt version: 1.33.2 
M-step optimizer: BFGS 
EM acceleration: Ramsay 
Number of rectangular quadrature: 61
Latent density type: Gaussian 

Information matrix estimated with method: Oakes
Second-order test: model is a possible local maximum
Condition number of information matrix =  361.1029

Log-likelihood = -5862.487
Estimated parameters: 240 
AIC = 12204.97; AICc = 11070.86
BIC = 12909.25; SABIC = 12149.94
G2 (1e+10) = 10355.96, p = 1
RMSEA = 0, CFI = NaN, TLI = NaN
            M2   df p      RMSEA    RMSEA_5   RMSEA_95      SRMSR       TLI
stats 3492.428 1590 0 0.09311421 0.08861305 0.09694605 0.05807705 0.9539929
            CFI
stats 0.9556659

Nominal Response Model (NRM)

The nominal response model can be applied to item responses that are not strictly ordered, but it can also be used with ordered responses. It is considered the most general polytomous item response theory model.


Call:
mirt(data = dat2.item, model = 1, itemtype = "nominal")

Full-information item factor analysis with 1 factor(s).
Converged within 1e-04 tolerance after 100 EM iterations.
mirt version: 1.33.2 
M-step optimizer: BFGS 
EM acceleration: Ramsay 
Number of rectangular quadrature: 61
Latent density type: Gaussian 

Log-likelihood = -5749.68
Estimated parameters: 360 
AIC = 12219.36; AICc = 11048.55
BIC = 13275.77; SABIC = 12136.81
G2 (1e+10) = 10130.35, p = 1
RMSEA = 0, CFI = NaN, TLI = NaN
            M2   df p      RMSEA    RMSEA_5   RMSEA_95       TLI       CFI
stats 3291.605 1470 0 0.09476084 0.09011006 0.09873236 0.9523563 0.9575538

Model-data Fit Comparison


Model 1: mirt(data = dat2.item, model = 1, itemtype = "rsm")
Model 2: mirt(data = dat2.item, model = 1, itemtype = "Rasch")
       AIC     AICc    SABIC       HQ      BIC    logLik      X2  df   p
1 13132.63 13240.15 13118.19 13207.76 13317.50 -6503.316     NaN NaN NaN
2 12516.72 10984.54 12475.22 12732.56 13047.86 -6077.361 851.908 118   0

Model 1: mirt(data = dat2.item, model = 1, itemtype = "Rasch")
Model 2: mirt(data = dat2.item, model = 1, itemtype = "gpcm")
       AIC     AICc    SABIC       HQ      BIC    logLik     X2  df   p
1 12516.72 10984.54 12475.22 12732.56 13047.86 -6077.361    NaN NaN NaN
2 12391.60 11257.49 12336.57 12677.80 13095.88 -5955.801 243.12  59   0

For nested models, goodness-of-fit statistics such as the difference \(\chi^2\) test can be used to select the better-fitting model.
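For example, the difference \(\chi^2\) for the Rasch vs. GPCM comparison follows directly from the log-likelihoods printed above:

\[
\chi^2_{\Delta} = -2\left[\log L_{\text{Rasch}} - \log L_{\text{GPCM}}\right]
= -2\left[(-6077.361) - (-5955.801)\right] = 243.12,
\]

with \(\Delta df = 59\), matching the X2 and df reported in the comparison output; the significant result favors the less restrictive GPCM.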

Model-data Fit Indices
Model LogLikelihood AIC BIC RMSEA CFI TLI
PCM -6077.36 12516.7 13047.9 0.078 0.97 0.97
GPCM -5955.80 12391.6 13095.9 0.087 0.96 0.96
RSM -6503.32 13132.6 13317.5 0.093 0.95 0.95
GRM -5862.49 12205.0 12909.3 0.093 0.96 0.95
NRM -5749.68 12219.4 13275.8 0.095 0.96 0.95

For non-nested model comparison, fit indices such as the log-likelihood, Akaike's information criterion (AIC), and the Bayesian information criterion (BIC) can be used. Lower values of these three indices indicate a better comparative fit.
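As a quick check, the reported values follow from \(\mathrm{AIC} = -2\log L + 2p\) and \(\mathrm{BIC} = -2\log L + p\ln n\), where \(p\) is the number of estimated parameters. For the GRM (\(\log L = -5862.487\), \(p = 240\)):

\[
\mathrm{AIC}_{\text{GRM}} = -2(-5862.487) + 2(240) = 12204.97,
\]

which matches the AIC in the GRM output above.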

Other indices such as the root mean square error of approximation (RMSEA), the Tucker-Lewis index (TLI), and the comparative fit index (CFI) are also helpful in evaluating model fit and judging the best-fitting model.

Values of the CFI and TLI higher than 0.90 indicate an acceptable fit, with values higher than 0.95 suggesting an excellent fit; RMSEA values less than 0.05 indicate a close fit of the model to the data, and values between 0.05 and 0.08 reflect a reasonable fit.

After evaluating these fit indices for each model, I concluded that the graded response model fits the given response data best.

4. Item Recommendation Using the Best-Fitting Model

To recommend about 25 to 30 items, I fitted the graded response model and obtained item parameter estimates, standardized factor loadings, category response curves, and item information curves.

Item parameter estimates

Item parameter estimates and standardized factor loadings for all 60 items were extracted and saved in a separate CSV file.
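A minimal sketch in R of how this extraction can be done with the mirt package, assuming the fitted GRM object from the mirt() call above is stored as `mod.grm` and the output file name `grm_item_parameters.csv` (both names are hypothetical):

```r
library(mirt)

# Item parameters in IRT parameterization (discrimination a, thresholds b1..bk)
ipar <- coef(mod.grm, simplify = TRUE, IRTpars = TRUE)$items

# Standardized factor loadings from the model summary
fl <- summary(mod.grm)$rotF

# Save parameters and loadings together for all 60 items
write.csv(cbind(ipar, F1 = fl), "grm_item_parameters.csv")

# Category response curve and item information curve, e.g. for Item 1
itemplot(mod.grm, item = 1, type = "trace")
itemplot(mod.grm, item = 1, type = "info")
```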

Item Plots

Check Item-Fit Statistics

Summary of Item-Fit Statistics
item Zh outfit z.outfit infit z.infit X2 df.X2 RMSEA.X2 p.X2 G2 df.G2 RMSEA.G2 p.G2 S_X2 df.S_X2 RMSEA.S_X2 p.S_X2
Item1 0.21 0.59 -0.44 0.97 -0.09 2.45 2 0.04 0.29 1.72 2 0.00 0.42 13.08 14 0.00 0.52
Item2 0.31 0.89 -0.48 0.97 -0.15 8.04 5 0.07 0.15 6.35 4 0.07 0.17 19.81 19 0.02 0.41
Item3 0.13 2.94 2.89 0.95 -0.28 1.79 2 0.00 0.41 0.82 2 0.00 0.66 12.31 15 0.00 0.66
Item4 -0.16 1.78 1.18 1.02 0.15 16.36 5 0.13 0.01 19.73 5 0.15 0.00 23.93 19 0.04 0.20
Item5 0.03 1.03 0.24 1.04 0.36 5.02 5 0.01 0.41 3.76 5 0.00 0.58 29.46 25 0.04 0.25
Item6 0.02 1.01 0.21 0.97 -0.17 4.76 5 0.00 0.45 8.60 5 0.07 0.13 21.99 18 0.04 0.23
Item7 0.16 0.76 -0.19 0.99 0.01 8.71 3 0.12 0.03 12.30 2 0.19 0.00 14.03 15 0.00 0.52
Item8 0.13 0.77 -0.43 0.92 -0.33 7.33 4 0.08 0.12 6.40 4 0.07 0.17 22.30 19 0.04 0.27
Item9 0.16 1.36 0.81 0.93 -0.45 5.14 3 0.07 0.16 7.53 3 0.10 0.06 24.47 19 0.05 0.18
Item10 0.32 0.65 -0.50 0.98 -0.09 2.00 2 0.00 0.37 2.04 2 0.01 0.36 14.14 17 0.00 0.66
Item11 0.35 0.65 -0.67 0.96 -0.23 3.08 3 0.01 0.38 2.78 3 0.00 0.43 15.71 16 0.00 0.47
Item12 0.06 1.03 0.18 0.92 -0.47 12.15 5 0.10 0.03 17.52 5 0.13 0.00 28.91 23 0.04 0.18
Item13 -0.11 1.01 0.16 1.04 0.33 13.50 4 0.13 0.01 12.13 4 0.12 0.02 20.47 18 0.03 0.31
Item14 -0.01 1.00 0.02 0.95 -0.37 27.12 9 0.12 0.00 28.27 9 0.12 0.00 50.19 25 0.09 0.00
Item15 0.06 0.88 -0.03 1.02 0.17 2.40 4 0.00 0.66 5.30 4 0.05 0.26 10.07 16 0.00 0.86
Item16 0.23 0.88 -0.02 0.92 -0.48 2.34 4 0.00 0.67 4.73 4 0.04 0.32 18.36 18 0.01 0.43
Item17 0.41 0.61 -0.78 0.93 -0.36 5.73 2 0.12 0.06 5.90 2 0.12 0.05 17.06 13 0.05 0.20
Item18 0.18 0.78 -0.31 0.97 -0.16 1.54 4 0.00 0.82 3.26 4 0.00 0.52 14.33 16 0.00 0.57
Item19 0.06 0.75 -0.15 0.99 0.01 4.15 3 0.05 0.25 3.85 3 0.05 0.28 15.12 16 0.00 0.52
Item20 0.24 0.56 -0.56 0.98 -0.04 7.77 1 0.22 0.01 9.04 1 0.24 0.00 25.92 15 0.07 0.04
Item21 -0.07 16.72 10.03 0.97 -0.13 4.17 6 0.00 0.65 6.25 6 0.02 0.40 12.30 19 0.00 0.87
Item22 -0.07 1.47 0.97 1.10 0.54 6.04 3 0.09 0.11 8.46 3 0.11 0.04 15.44 18 0.00 0.63
Item23 0.19 0.83 -0.37 0.99 0.01 8.82 5 0.07 0.12 8.38 5 0.07 0.14 23.79 21 0.03 0.30
Item24 -0.06 1.60 1.06 0.92 -0.47 4.14 3 0.05 0.25 7.68 3 0.11 0.05 19.74 17 0.03 0.29
Item25 0.15 0.93 -0.39 0.97 -0.19 9.27 4 0.10 0.05 10.86 4 0.11 0.03 16.53 19 0.00 0.62
Item26 -0.09 1.08 0.50 0.96 -0.30 14.75 7 0.09 0.04 21.04 7 0.12 0.00 38.48 26 0.06 0.05
Item27 0.11 0.75 -0.37 1.00 0.08 6.61 4 0.07 0.16 9.59 4 0.10 0.05 31.37 18 0.07 0.03
Item28 0.05 1.02 0.16 1.07 0.51 4.75 3 0.06 0.19 -1.44 3 0.00 1.00 18.04 21 0.00 0.65
Item29 0.07 1.07 0.45 1.09 0.66 4.40 5 0.00 0.49 0.29 5 0.00 1.00 22.98 22 0.02 0.40
Item30 0.10 0.69 -0.31 1.01 0.13 2.78 2 0.05 0.25 2.34 2 0.04 0.31 18.76 16 0.04 0.28
Item31 -0.10 1.17 0.78 1.08 0.62 11.41 6 0.08 0.08 8.70 6 0.06 0.19 22.78 24 0.00 0.53
Item32 0.24 0.63 -0.33 0.96 -0.15 4.44 3 0.06 0.22 6.73 3 0.09 0.08 9.92 14 0.00 0.77
Item33 0.18 0.93 -0.32 0.96 -0.25 18.18 7 0.11 0.01 17.84 7 0.11 0.01 35.96 23 0.06 0.04
Item34 0.16 0.64 -0.19 0.99 0.02 13.31 5 0.11 0.02 15.83 5 0.13 0.01 24.20 15 0.07 0.06
Item35 -0.01 1.05 0.30 1.06 0.51 4.75 4 0.04 0.31 2.23 4 0.00 0.69 25.69 20 0.05 0.18
Item36 0.05 0.82 -0.08 1.00 0.07 0.62 2 0.00 0.73 1.35 2 0.00 0.51 13.35 13 0.01 0.42
Item37 0.27 0.88 -0.48 0.97 -0.21 3.44 4 0.00 0.49 2.06 4 0.00 0.72 21.76 22 0.00 0.47
Item38 0.14 0.98 -0.03 0.99 -0.04 9.62 6 0.07 0.14 7.78 6 0.05 0.25 25.39 24 0.02 0.38
Item39 0.03 1.04 0.24 0.96 -0.19 11.12 5 0.09 0.05 12.98 5 0.11 0.02 30.79 21 0.06 0.08
Item40 0.05 1.12 0.63 1.07 0.53 7.07 5 0.05 0.22 10.05 5 0.09 0.07 36.61 23 0.07 0.04
Item41 0.15 1.03 0.19 1.05 0.41 2.41 2 0.04 0.30 3.55 2 0.07 0.17 19.04 19 0.00 0.45
Item42 0.21 0.92 -0.33 0.96 -0.27 9.49 4 0.10 0.05 7.77 3 0.11 0.05 12.59 21 0.00 0.92
Item43 0.03 1.17 0.58 0.93 -0.47 4.98 3 0.07 0.17 6.88 3 0.10 0.08 39.71 19 0.09 0.00
Item44 0.20 0.83 -0.18 0.95 -0.23 2.62 3 0.00 0.45 2.17 3 0.00 0.54 13.49 16 0.00 0.64
Item45 0.23 0.81 -0.77 0.90 -0.66 29.11 6 0.17 0.00 25.78 4 0.20 0.00 41.63 20 0.09 0.00
Item46 0.09 0.48 -0.30 0.91 -0.20 3.17 0 NaN NaN 2.21 0 NaN NaN 9.75 6 0.07 0.14
Item47 0.27 0.87 -0.61 0.94 -0.39 4.49 4 0.03 0.34 7.16 4 0.08 0.13 26.33 21 0.04 0.19
Item48 -0.07 1.26 1.30 1.16 1.19 7.28 5 0.06 0.20 6.81 5 0.05 0.24 25.21 23 0.03 0.34
Item49 0.07 0.99 0.04 1.04 0.34 3.68 4 0.00 0.45 4.31 4 0.02 0.37 24.37 21 0.03 0.28
Item50 0.05 1.03 0.21 1.10 0.78 3.61 4 0.00 0.46 -0.98 4 0.00 1.00 26.66 22 0.04 0.22
Item51 0.21 0.90 -0.54 0.99 -0.07 3.23 5 0.00 0.66 4.22 5 0.00 0.52 23.17 20 0.03 0.28
Item52 0.07 1.01 0.13 0.98 -0.10 7.36 4 0.08 0.12 2.59 3 0.00 0.46 28.43 23 0.04 0.20
Item53 0.25 0.84 -0.67 1.03 0.25 7.01 2 0.13 0.03 6.25 2 0.12 0.04 18.47 15 0.04 0.24
Item54 0.13 0.96 -0.19 1.01 0.11 10.85 4 0.11 0.03 3.96 2 0.08 0.14 24.66 22 0.03 0.31
Item55 0.03 1.04 0.26 1.01 0.14 7.59 6 0.04 0.27 6.17 6 0.01 0.40 35.51 26 0.05 0.10
Item56 0.29 0.86 -0.41 1.04 0.35 3.33 3 0.03 0.34 2.45 3 0.00 0.49 25.17 20 0.04 0.20
Item57 0.02 1.11 0.50 0.97 -0.18 4.17 4 0.02 0.38 4.96 4 0.04 0.29 19.72 22 0.00 0.60
Item58 0.08 1.04 0.30 1.09 0.71 6.42 3 0.09 0.09 3.34 3 0.03 0.34 21.60 19 0.03 0.30
Item59 0.07 0.93 -0.14 0.96 -0.24 2.97 3 0.00 0.40 3.02 3 0.01 0.39 34.29 19 0.08 0.02
Item60 -0.01 1.32 1.66 0.97 -0.19 10.95 4 0.11 0.03 15.52 4 0.14 0.00 23.56 21 0.03 0.31
Maximum Item Information

The loop output below is summarized as a table: the maximum of each item's information function and the \(\theta\) value at which it occurs, assumed to be printed in item order (the leading row labels were quadrature-point indices and are dropped).

item MaxInfo theta.at.max
Item1 3.570555 -1.46
Item2 3.701036 -0.82
Item3 5.561037 -1.00
Item4 3.570588 -0.83
Item5 1.004998 1.25
Item6 2.939597 -1.23
Item7 7.947952 -0.35
Item8 0.8680328 -2.51
Item9 4.012261 -0.09
Item10 6.87392 -0.84
Item11 7.456777 -0.86
Item12 0.6458577 -2.31
Item13 2.611754 -1.41
Item14 1.476888 -0.58
Item15 2.818058 -1.45
Item16 4.911946 -0.50
Item17 9.975786 -0.83
Item18 3.890309 -0.54
Item19 1.432936 -2.33
Item20 2.758194 -2.19
Item21 1.349933 -1.46
Item22 1.182987 -2.43
Item23 3.829325 0.37
Item24 2.632179 -1.69
Item25 1.119239 1.32
Item26 1.284963 -0.74
Item27 1.378351 -2.11
Item28 2.550218 -0.03
Item29 2.85268 0.14
Item30 2.084035 -2.31
Item31 2.056891 -0.09
Item32 4.353934 -1.52
Item33 2.23859 0.00
Item34 3.233172 -1.43
Item35 1.891507 1.05
Item36 3.293347 -1.86
Item37 3.291923 0.01
Item38 1.719203 1.06
Item39 2.492895 0.22
Item40 2.756464 0.70
Item41 4.29169 0.27
Item42 1.996986 0.41
Item43 4.229973 -0.18
Item44 4.948164 0.15
Item45 3.182825 0.81
Item46 3.934302 -1.99
Item47 3.756569 0.45
Item48 2.021608 0.68
Item49 2.243699 0.64
Item50 2.011606 0.38
Item51 3.377894 0.17
Item52 1.556218 1.40
Item53 5.567725 0.39
Item54 2.516734 0.62
Item55 0.9799728 1.55
Item56 4.824898 0.26
Item57 1.98861 -0.92
Item58 2.431397 0.86
Item59 2.383984 -0.22
Item60 2.915979 0.64